NSF PAR Search | NSF Public Access Repository

Distributional Drift Detection in Medical Imaging with Sketching and Fine-Tuned Transformer

https://doi.org/10.1109/ICDH67620.2025.00013

Wu, Yusen; Nguyen, Phuong; Yesha, Rose; Yesha, Yelena (July 2025, IEEE)

Distributional drift detection is important in medical applications as it helps ensure the accuracy and reliability of models by identifying changes in the underlying data distribution that could affect the prediction results of machine learning models. However, current methods have limitations in detecting drift; for example, the inclusion of abnormal datasets can lead to unfair comparisons. This paper presents an accurate and sensitive approach to detect distributional drift in CT-scan medical images by leveraging data-sketching and fine-tuning techniques. We developed a robust baseline library model for real-time anomaly detection, allowing for efficient comparison of incoming images and identification of anomalies. Additionally, we fine-tuned a pre-trained Vision Transformer model to extract relevant features, using mammography as a case study, significantly enhancing model accuracy to 99.11%. Combining data sketches with fine-tuning, our feature extraction evaluation demonstrated that cosine similarity scores between similar datasets improved from around 50% to 99.1%. Finally, the sensitivity evaluation shows that our solution is highly sensitive to even 1% salt-and-pepper and speckle noise but is not sensitive to lighting noise (that is, lighting conditions have no impact on data drift). The proposed methods offer a scalable and reliable solution for maintaining the accuracy of diagnostic models in dynamic clinical environments.
Free, publicly-accessible full text available July 7, 2026
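
The drift-detection abstract above compares incoming images with a baseline by the cosine similarity of extracted features. A minimal Python sketch of that comparison follows. It is not the authors' implementation: the feature vectors are assumed to come from some upstream extractor (for example a fine-tuned Vision Transformer), the data-sketching step is replaced here by a plain per-dimension mean, and the names `mean_vector`, `drift_detected`, and the 0.95 threshold are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Per-dimension mean of a batch of feature vectors.

    Assumption: this stands in for the paper's data sketch, which the
    abstract does not describe in detail.
    """
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def drift_detected(baseline_features, incoming_features, threshold=0.95):
    """Return (is_drift, score); drift is flagged when the score falls below
    the threshold. The threshold value is an illustrative assumption."""
    score = cosine_similarity(
        mean_vector(baseline_features), mean_vector(incoming_features)
    )
    return score < threshold, score


# Toy 2-dimensional "features" for illustration only.
baseline = [[1.0, 0.0], [0.9, 0.1]]
shifted = [[0.0, 1.0], [0.1, 0.9]]
same_flag, same_score = drift_detected(baseline, baseline)
shift_flag, shift_score = drift_detected(baseline, shifted)
```

In practice the threshold would have to be calibrated on held-out batches known to come from the baseline distribution.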
Machine learning approach for the flexural strength of 3D‐printed fiber‐reinforced concrete based on the meta‐heuristic algorithm

https://doi.org/10.1002/suco.70195

Khodadadi, Nima; Roghani, Hossein; De_Caso, Francisco; El‐kenawy, El‐Sayed M; Yesha, Yelena; Nanni, Antonio (August 2025, Structural Concrete)

The increasing demand for concrete in construction presents challenges such as pollution, high energy consumption, and complex structural requirements. Three‐dimensional printing (3DP) offers a promising solution by eliminating formwork, reducing waste, and enabling intricate geometries. Predicting the strength of 3D‐printed fiber‐reinforced concrete (3DP‐FRC) remains challenging due to the nonlinear nature of neural networks and uncertainty in optimizing key parameters. In this study, we developed machine learning models using five metaheuristic algorithms, namely the Arithmetic Optimization Algorithm, African Vulture Optimization Algorithm, Flow Direction Algorithm, Generalized Normal Distribution Optimization, and Mountain Gazelle Optimizer (MGO), to optimize the weights and biases in a feed‐forward backpropagation network. Among all the algorithms, MGO demonstrated the best performance. To address data limitations, a data augmentation method combining kernel density estimation and Wasserstein generative adversarial networks is employed. Sensitivity analysis using SHapley Additive exPlanations (SHAP) identifies the most influential input parameters. The proposed MGO‐ANN model enhances predictive accuracy, reducing the need for extensive laboratory testing. Additionally, a user‐friendly graphical user interface is developed to facilitate practical applications in estimating 3DP‐FRC flexural strength.
Free, publicly-accessible full text available August 1, 2026
Evaluation of factors underlying differences in venous thromboembolism rates between Black and White patients

https://doi.org/10.1016/j.jvsv.2025.102270

Lin, Mary S; Sahoo, Shalini; Hayssen, Hilary; Mayorga-Carlin, Minerva; Englum, Brian; Siddiqui, Tariq; Nguyen, Phuong; Yesha, Yelena; Sorkin, John D; Lal, Brajesh K (September 2025, Journal of Vascular Surgery: Venous and Lymphatic Disorders)

Free, publicly-accessible full text available September 1, 2026
A composite risk assessment model for venous thromboembolism

https://doi.org/10.1016/j.jvsv.2024.101968

Lin, Mary Sixian; Hayssen, Hilary; Mayorga-Carlin, Minerva; Sahoo, Shalini; Siddiqui, Tariq; Jreij, Georges; Englum, Brian R; Nguyen, Phuong; Yesha, Yelena; Sorkin, John David; et al (January 2025, Journal of Vascular Surgery: Venous and Lymphatic Disorders)

Full Text Available
Data-driven PSO-CatBoost machine learning model to predict the compressive strength of CFRP-confined circular concrete specimens

https://doi.org/10.1016/j.tws.2024.111763

Khodadadi, Nima; Roghani, Hossein; De_Caso, Francisco; El-kenawy, El-Sayed M; Yesha, Yelena; Nanni, Antonio (May 2024, Thin-Walled Structures)

Full Text Available
Blockchains for Internet of Things: Fundamentals, Applications, and Challenges

https://doi.org/10.1109/MNET.2024.3410640

Wu, Yusen; Hu, Ye; Chen, Mingzhe; Yesha, Yelena; Debbah, Mérouane (January 2024, IEEE Network)

Full Text Available
Quantifying input data drift in medical machine learning models by detecting change-points in time-series data

https://doi.org/10.1117/12.3008771

Prathapan, Smriti; Samala, Ravi K; Hadjiyski, Nathan; D’Haese, Pierre-François; Maldonado, Fabien; Nguyen, Phuong; Yesha, Yelena; Sahiner, Berkman (April 2024, SPIE)
Astley, Susan M; Chen, Weijie (Ed.)
Devices enabled by artificial intelligence (AI) and machine learning (ML) are being introduced for clinical use at an accelerating pace. In a dynamic clinical environment, these devices may encounter conditions different from those they were developed for. The statistical data mismatch between training/initial testing and production is often referred to as data drift. Detecting and quantifying data drift is important for ensuring that an AI model performs as expected in clinical environments. A drift detector signals when a corrective action is needed if the performance changes. In this study, we investigate how a change in the performance of an AI model due to data drift can be detected and quantified using a cumulative sum (CUSUM) control chart. To study the properties of CUSUM, we first simulate different scenarios that change the performance of an AI model. We simulate a sudden change in the mean of the performance metric at a change-point (change day) in time. The task is to quickly detect the change while raising few false alarms before the change-point, which may be caused by the statistical variation of the performance metric over time. Subsequently, we simulate data drift by denoising the Emory Breast Imaging Dataset (EMBED) after a pre-defined change-point. We detect the change-point by studying the pre- and post-change specificity of a mammographic CAD algorithm. Our results indicate that with the appropriate choice of parameters, CUSUM is able to quickly detect relatively small drifts with a small number of false-positive alarms.
Full Text Available
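
The CUSUM control chart named in the abstract above can be sketched in a few lines of Python. This is the standard one-sided tabular CUSUM for detecting a drop in the mean of a daily performance metric, not the authors' code: the function name `cusum_first_alarm`, the parameter names, and all numeric values below are illustrative assumptions.

```python
def cusum_first_alarm(values, target_mean, slack, threshold):
    """One-sided CUSUM for a decrease in the mean of a metric.

    values      : daily metric values (for example, specificity per day)
    target_mean : in-control mean of the metric
    slack       : allowance subtracted each day so that ordinary
                  statistical variation does not accumulate
    threshold   : decision limit; an alarm is raised once the
                  cumulative statistic exceeds it

    Returns the index of the first alarm day, or None if no alarm occurs.
    """
    statistic = 0.0
    for day, value in enumerate(values):
        # Accumulate shortfalls below the target, never going negative.
        statistic = max(0.0, statistic + (target_mean - value) - slack)
        if statistic > threshold:
            return day
    return None


# Simulated metric: stable at 0.90 for ten days, then a sudden drop to 0.80.
metric = [0.90] * 10 + [0.80] * 10
alarm_day = cusum_first_alarm(metric, target_mean=0.90, slack=0.02, threshold=0.2)
stable_alarm = cusum_first_alarm([0.90] * 20, target_mean=0.90, slack=0.02, threshold=0.2)
```

A larger `threshold` lowers the false-alarm rate at the cost of a longer detection delay, which is the trade-off the abstract describes when it refers to an appropriate choice of parameters.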
Improving VTE Identification through Language Models from Radiology Reports: A Comparative Study of Mamba, Phi-3 Mini, and BERT

https://doi.org/10.1109/BIBM62325.2024.10822229

Deng, Jamie; Wu, Yusen; Yesha, Yelena; Nguyen, Phuong (December 2024, IEEE)

Full Text Available
CCS-GAN: COVID-19 CT Scan Generation and Classification with Very Few Positive Training Images

https://doi.org/10.1007/s10278-023-00811-2

Menon, Sumeet; Mangalagiri, Jayalakshmi; Galita, Josh; Morris, Michael; Saboury, Babak; Yesha, Yaacov; Yesha, Yelena; Nguyen, Phuong; Gangopadhyay, Aryya; Chapman, David (August 2023, Journal of Digital Imaging)

We present a novel algorithm that is able to generate deep synthetic COVID-19 pneumonia CT scan slices using a very small sample of positive training images in tandem with a larger number of normal images. This generative algorithm produces images of sufficient accuracy to enable a DNN classifier to achieve high classification accuracy using as few as 10 positive training slices (from 10 positive cases), which to the best of our knowledge is one order of magnitude fewer than the next closest published work at the time of writing. Deep learning with extremely small positive training volumes is a very difficult problem and has been an important topic during the COVID-19 pandemic, because for quite some time it was difficult to obtain large volumes of COVID-19-positive images for training. Algorithms that can learn to screen for diseases using few examples are an important area of research. Furthermore, algorithms to produce deep synthetic images with smaller data volumes have the added benefit of reducing the barriers to data sharing between healthcare institutions. We present the cycle-consistent segmentation-generative adversarial network (CCS-GAN). CCS-GAN combines style transfer with pulmonary segmentation and relevant transfer learning from negative images in order to create a larger volume of synthetic positive images for the purposes of improving diagnostic classification performance. A VGG-19 classifier combined with CCS-GAN was trained using a small sample of positive image slices, ranging from at most 50 down to as few as 10 COVID-19-positive CT scan images. CCS-GAN achieves high accuracy with few positive images and thereby greatly reduces the barrier of acquiring large training volumes in order to train a diagnostic classifier for COVID-19.
Full Text Available
Active Semi-Supervised Learning via Bayesian Experimental Design for Lung Cancer Classification Using Low Dose Computed Tomography Scans

https://doi.org/10.3390/app13063752

Nguyen, Phuong; Rathod, Ankita; Chapman, David; Prathapan, Smriti; Menon, Sumeet; Morris, Michael; Yesha, Yelena (March 2023, Applied Sciences)

We introduce an active, semisupervised algorithm that utilizes Bayesian experimental design to address the shortage of annotated images required to train and validate Artificial Intelligence (AI) models for lung cancer screening with computed tomography (CT) scans. Our approach incorporates active learning with semisupervised expectation maximization to emulate the human in the loop for additional ground truth labels to train, evaluate, and update the neural network models. Bayesian experimental design is used to intelligently identify which unlabeled samples need ground truth labels to enhance the model’s performance. We evaluate the proposed Active Semi-supervised Expectation Maximization for Computer-aided diagnosis (CAD) tasks (ASEM-CAD) on lung cancer classification using three public CT scan datasets: the National Lung Screening Trial (NLST), the Lung Image Database Consortium (LIDC), and Kaggle Data Science Bowl 2017. ASEM-CAD can accurately classify suspicious lung nodules and lung cancer cases with an area under the curve (AUC) of 0.94 (Kaggle), 0.95 (NLST), and 0.88 (LIDC) with significantly fewer labeled images compared to a fully supervised model. This study addresses one of the significant challenges in early lung cancer screenings using low-dose computed tomography (LDCT) scans and is a valuable contribution towards the development and validation of deep learning algorithms for lung cancer screening and other diagnostic radiology examinations.
Full Text Available
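
The abstract above describes choosing which unlabeled samples should receive ground truth labels. As a rough illustration of that selection step only, the Python sketch below ranks unlabeled samples by predictive entropy, a common uncertainty-based acquisition rule. This is a swapped-in simplification and not the Bayesian experimental design criterion used in ASEM-CAD, which the abstract does not specify; the function names and the toy probabilities are hypothetical.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_for_labeling(predicted_probs, budget):
    """Return the indices of the `budget` most uncertain samples.

    predicted_probs : list of per-sample class-probability lists, assumed
                      to come from the current model on the unlabeled pool
    budget          : number of samples to send to a human annotator
    """
    ranked = sorted(
        range(len(predicted_probs)),
        key=lambda i: predictive_entropy(predicted_probs[i]),
        reverse=True,
    )
    return ranked[:budget]


# Toy binary predictions (benign vs. malignant) for four unlabeled scans.
pool = [[0.99, 0.01], [0.50, 0.50], [0.80, 0.20], [0.60, 0.40]]
chosen = select_for_labeling(pool, budget=2)
```

After the selected samples are labeled, the model would be retrained and the pool re-scored, repeating until the labeling budget is exhausted.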